ANNUAL REPORT ON CONTRACT N00014-88-K-0322
PRINCIPAL INVESTIGATOR: Shalom R. Rackovsky
CONTRACTOR: University of Rochester

CONTRACT TITLE: Quantitative Classification of Known Protein Structures
START DATE: 1 July 1988

RESEARCH OBJECTIVE: To develop mathematical methods for comparing and classifying
the structures of proteins of arbitrary molecular weight; to develop a data base of
protein x-ray structures representative of the entire set of known protein structures; to
apply the methods developed to the data base, in order to understand the structural
relationships between the known protein structures; to develop new insight into aspects of
protein folding and evolutionary relationships based on the results of the comparison
and classification studies.

PROGRESS (Year 1): Since ONR funding began on July 1, 1988, rapid progress has been
made on several of the stated research objectives. A set of mathematical tools has
been developed, using methods of graph, matrix and classification theory, which makes
it possible to analyze very large structural data bases rapidly, accurately and easily.
These tools have been implemented computationally. A data base of 123 protein x-ray
structures has been assembled from the Brookhaven Protein Data Bank, spanning the entire
range of known protein structures. This set of proteins has been studied on the 4-, 5-
and 6-alpha-carbon length scales, with respect to both global organization of the
proteinome (i.e. the relationships between large groups of proteins) and local organization
(i.e. definition of structures which are similar to any given protein). Observations on this
data base confirm some of our earlier findings, made on a much smaller data base
using less sophisticated methods of analysis, on structural relationships as a function of
length scale and on convergent evolution of folding mechanisms. Some unsuspected
structural relationships have come to light. For example, it has been demonstrated that
there are two distinct classes of β-sheet/barrel proteins: one with mainly flat extended
structures, and one with mainly twisted (right- or left-handed) extended structures.
Evidence is also beginning to emerge for the existence of three classes of mainly helical
proteins. It has been shown that the representational methods developed are able not
only to demonstrate the similarity of related proteins, but also to detect anomalies arising
from the presence in the data base of proteins with less accurately determined
structures. It has further been shown that these anomalous relationships disappear when the
analysis is carried out at lower resolution, as befits low-quality structures.
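
As an illustration of what a study "on the 4-alpha-carbon length scale" involves, the following minimal sketch splits a chain of alpha-carbon coordinates into overlapping fragments of a chosen length and computes one simple descriptor per 4-alpha-carbon fragment: the dihedral angle about the middle virtual bond. The function names (fragments, virtual_dihedral, dihedral_profile) and the choice of descriptor are assumptions made for illustration only; they are not the representation or the software developed under this contract.

```python
import math

def fragments(ca_coords, n):
    """Return every run of n consecutive alpha-carbon positions."""
    return [ca_coords[i:i + n] for i in range(len(ca_coords) - n + 1)]

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def virtual_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the virtual bond p1-p2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = _dot(b2, _cross(n1, n2))              # proportional to sin(angle)
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)  # proportional to cos(angle)
    return math.degrees(math.atan2(y, x))

def dihedral_profile(ca_coords):
    """One virtual dihedral per 4-alpha-carbon fragment along the chain."""
    return [virtual_dihedral(*frag) for frag in fragments(ca_coords, 4)]

# Hypothetical input: an idealized planar zigzag standing in for real coordinates.
zigzag = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]
profile = dihedral_profile(zigzag)
```

A chain of N alpha carbons yields N - n + 1 fragments of length n, so moving to the 5- or 6-alpha-carbon scale only changes the argument passed to fragments. For the perfectly planar zigzag above, every virtual dihedral has magnitude 180 degrees; twist in an extended structure would show up as a departure from that value.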


WORK PLAN (Year 2): It is planned to extend the analysis to longer length scales,
where one expects that structural relationships will become even more strongly defined.
In addition, it is planned to study the effect of resolution of the representation, as this
affects the results of the clustering studies which define the structure of the proteinome. It
is also hoped to carry out studies of non-sequential backbone structures, i.e. those
defined by virtual bonds which do not connect successive alpha carbons. Some of these,
such as the crossover connection, have been shown to be highly characteristic features
of protein structure. The methods we have developed make it possible to carry out a
complete, quantitative census of such structures for the first time, and this can yield
potentially useful, previously inaccessible data. It is anticipated that a start will be made
this year in connecting the masses of data resulting from our analyses with the
fundamental folding and evolutionary questions of interest. It is also planned to develop
methods, based on our mathematical tools, for structural alignment of proteins and for the
rapid search of the complete data base for fragments similar to a chosen structure.

INVENTIONS: No inventions have resulted from this work.

PUBLICATIONS AND REPORTS: A paper is currently in preparation which will detail
the results of our studies to date. In addition, a Progress Report was submitted to the
ONR Distribution List, as required, by June 1, 1989.

TRAINING ACTIVITIES: No students have been associated with this project this year. A
graduate student will be joining the project on October 1, 1989.

AWARDS/FELLOWSHIPS: None.